Neural network speech processing for toys and consumer electronics
The ongoing challenge in speech research is recognizing continuous,
unconstrained speech. In comparison, isolated word recognition with small
vocabularies is easy. Many commercial efforts are aimed at the high-end
problem. Sensory, Inc. has successfully focused on the low end, producing
a family of low-cost speech recognition chips for toys, consumer electronics,
electronic learning aids, and home appliances. The chips are based on a
4-MIPS, 8-bit microcontroller with on-board automatic gain control (AGC),
A/D and D/A conversion, and digital filtering. The microcontroller can be
programmed for speaker-independent or speaker-dependent recognition, voice
verification (recognizing a stored password spoken by a particular speaker),
polyphonic music synthesis, speech synthesis, and voice record and playback,
and it has enough power to drive and communicate with the application
product. The speech recognition and voice verification
products are neural-network based. Speaker-independent recognition of
vocabularies of up to 10 words achieves accuracies of 95-98%. Speaker-dependent
recognition of vocabularies of up to 60 items has an accuracy greater than
99%. The chip can be programmed to handle larger vocabularies by
context-dependent switching of recognition sets. The neural net architectures
are fairly standard, but the hardware and real-world usage impose some
interesting challenges, including speed constraints which necessitate integer
arithmetic, very limited RAM, and on-line speaker adaptation.
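
To illustrate what the integer-arithmetic constraint implies, the following is
a minimal sketch of one fully connected layer evaluated with integers only. It
is not the implementation used on the chips: the function names, the 8-bit
value range, and the shift-based rescaling are assumptions chosen for
illustration, and Python is used only for readability.

```python
def saturate_int8(x):
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def dense_int8(inputs, weights, biases, shift):
    """Evaluate one fully connected layer using integer arithmetic only.

    inputs  : list of ints, assumed to lie in [-128, 127]
    weights : list of rows, one row of int weights per output unit
    biases  : list of ints, expressed at the accumulator's scale
    shift   : right shift that rescales the wide accumulator to 8 bits
              (hypothetical fixed-point scaling; no floats are used)
    """
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias  # wide accumulator; Python ints never overflow
        for x, w in zip(inputs, row):
            acc += x * w  # integer multiply-accumulate
        # Python's >> on negative ints rounds toward minus infinity.
        outputs.append(saturate_int8(acc >> shift))
    return outputs


def argmax(values):
    """Return the index of the largest value (the winning output unit)."""
    return max(range(len(values)), key=values.__getitem__)


scores = dense_int8([64, -32, 16], [[10, 20, -5], [-8, 4, 12]], [0, 128], 4)
best = argmax(scores)
```

In this sketch the only per-layer working storage is the accumulator and the
output list, which is the kind of economy that very limited RAM would call
for; a real 8-bit target would also have to manage the accumulator width
explicitly.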